Movie Rating Analysis

Movie rating analysis is the process of examining and summarizing the ratings that audiences or critics give to movies. It can show which types of movies are popular, how well they are received, and which factors influence their ratings. Movie producers and marketers can use these insights to understand what audiences look for in a movie and to plan strategies that improve their chances of success.

In [10]:
## Import the necessary Python libraries and load the movies dataset:
import numpy as np
import pandas as pd
# The '::' separator is longer than one character, so pandas needs the Python
# parsing engine; passing engine='python' explicitly avoids a ParserWarning.
movies = pd.read_csv("F:/Swapnil/portfolio/projects portfolio/Movie Rating Analysis/movies.dat", delimiter='::', engine='python')
print(movies.head())
   0000008      Edison Kinetoscopic Record of a Sneeze (1894)  \
0       10                La sortie des usines Lumière (1895)   
1       12                      The Arrival of a Train (1896)   
2       25  The Oxford and Cambridge University Boat Race ...   
3       91                         Le manoir du diable (1896)   
4      131                           Une nuit terrible (1896)   

     Documentary|Short  
0    Documentary|Short  
1    Documentary|Short  
2                  NaN  
3         Short|Horror  
4  Short|Comedy|Horror  

In [11]:
## Define the column names:
movies.columns = ["ID", "Title", "Genre"]
print(movies.head())
    ID                                              Title                Genre
0   10                La sortie des usines Lumière (1895)    Documentary|Short
1   12                      The Arrival of a Train (1896)    Documentary|Short
2   25  The Oxford and Cambridge University Boat Race ...                  NaN
3   91                         Le manoir du diable (1896)         Short|Horror
4  131                           Une nuit terrible (1896)  Short|Comedy|Horror
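Note that the first printed header above (`0000008`, `Edison Kinetoscopic Record of a Sneeze (1894)`, `Documentary|Short`) looks like a data record rather than a set of column names, so renaming the columns afterwards discards that record. A minimal sketch of one way to keep it, assuming the file has no header row: pass `header=None` and `names=` when reading. The in-memory text and the name `movies_sample` below are stand-ins for illustration, not the real `movies.dat`.

```python
import io

import pandas as pd

# Two sample lines in the same "::"-separated layout (illustrative stand-in
# for movies.dat; the zero-padded ID on the second line is an assumption).
sample_text = io.StringIO(
    "0000008::Edison Kinetoscopic Record of a Sneeze (1894)::Documentary|Short\n"
    "0000010::La sortie des usines Lumière (1895)::Documentary|Short\n"
)

# header=None tells pandas that the first line is data, not column names;
# names= supplies the column names directly.
movies_sample = pd.read_csv(
    sample_text,
    sep="::",
    engine="python",
    header=None,
    names=["ID", "Title", "Genre"],
)
print(movies_sample)
```

With this approach the separate `movies.columns = [...]` step is no longer needed.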
In [12]:
## Now import the ratings dataset (engine='python' again avoids the ParserWarning):
ratings = pd.read_csv("F:/Swapnil/portfolio/projects portfolio/Movie Rating Analysis/ratings.dat", delimiter='::', engine='python')
print(ratings.head())
   1  0114508  8  1381006850
0  2   499549  9  1376753198
1  2  1305591  8  1376742507
2  2  1428538  1  1371307089
3  3    75314  1  1595468524
4  3   102926  9  1590148016
In [13]:
## Define the column names for this dataset as well:
ratings.columns = ["User", "ID", "Ratings", "Timestamp"]
print(ratings.head())
   User       ID  Ratings   Timestamp
0     2   499549        9  1376753198
1     2  1305591        8  1376742507
2     2  1428538        1  1371307089
3     3    75314        1  1595468524
4     3   102926        9  1590148016
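The `Timestamp` values look like Unix epoch seconds, although the notebook does not confirm this. Under that assumption, a short sketch of converting them to readable dates is shown here on a few rows copied from the output above; the names `ratings_sample` and `RatedAt` are hypothetical.

```python
import pandas as pd

# A few rows copied from the printed ratings output
ratings_sample = pd.DataFrame(
    {
        "User": [2, 2, 3],
        "ID": [499549, 1305591, 75314],
        "Ratings": [9, 8, 1],
        "Timestamp": [1376753198, 1376742507, 1595468524],
    }
)

# Assumption: Timestamp holds seconds since 1970-01-01 (Unix epoch)
ratings_sample["RatedAt"] = pd.to_datetime(ratings_sample["Timestamp"], unit="s")
print(ratings_sample[["Timestamp", "RatedAt"]])
```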
In [14]:
## Merge the two datasets on their common column, ID, which holds the movie ID:
data = pd.merge(movies, ratings, on="ID")
print(data.head())
   ID                                              Title              Genre  \
0  10                La sortie des usines Lumière (1895)  Documentary|Short   
1  12                      The Arrival of a Train (1896)  Documentary|Short   
2  25  The Oxford and Cambridge University Boat Race ...                NaN   
3  91                         Le manoir du diable (1896)       Short|Horror   
4  91                         Le manoir du diable (1896)       Short|Horror   

    User  Ratings   Timestamp  
0  70577       10  1412878553  
1  69535       10  1439248579  
2  37628        8  1488189899  
3   5814        6  1385233195  
4  37239        5  1532347349  
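By default `pd.merge` performs an inner join, so ratings whose movie ID has no match in the movies table are silently dropped. The sketch below, on small made-up frames, shows one way to make the join explicit and to count the ratings that would be lost. The rating with ID 999 is invented for illustration, and `validate="one_to_many"` assumes each movie ID appears only once in the movies table.

```python
import pandas as pd

movies_sample = pd.DataFrame(
    {
        "ID": [10, 12, 91],
        "Title": [
            "La sortie des usines Lumière (1895)",
            "The Arrival of a Train (1896)",
            "Le manoir du diable (1896)",
        ],
    }
)
ratings_sample = pd.DataFrame(
    {
        "User": [70577, 69535, 5814, 37239, 1],
        "ID": [10, 12, 91, 91, 999],  # 999 is a made-up ID with no matching movie
        "Ratings": [10, 10, 6, 5, 7],
    }
)

# Inner join on the shared movie ID; validate raises an error if an ID is
# duplicated in the movies table.
merged = pd.merge(
    movies_sample, ratings_sample, on="ID", how="inner", validate="one_to_many"
)

# Ratings that refer to a movie missing from the movies table
unmatched = ratings_sample[~ratings_sample["ID"].isin(movies_sample["ID"])]

print(merged)
print(f"Ratings without a matching movie: {len(unmatched)}")
```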
In [17]:
## Look at the distribution of all the ratings given by viewers:
import plotly.offline as pyo
import plotly.express as px
# Set notebook mode to work offline
pyo.init_notebook_mode()
# Count how often each rating value occurs; a new variable name avoids
# overwriting the ratings DataFrame loaded earlier
rating_counts = data["Ratings"].value_counts()
numbers = rating_counts.index
quantity = rating_counts.values
# The counts are passed directly, so no data frame argument is needed
fig = px.pie(values=quantity, names=numbers)
fig.show()

According to the pie chart above, 8 is the rating users give most often. The figure also suggests that most ratings are positive.
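The same reading can be checked numerically instead of from the chart. The sketch below uses a small made-up `Ratings` column rather than the real data, and treating a rating of 7 or more as "positive" is an assumption made only for this example.

```python
import pandas as pd

# Made-up ratings for illustration only
sample = pd.DataFrame({"Ratings": [8, 8, 8, 10, 7, 9, 8, 1, 7, 10]})

# Share of each rating value, ordered by rating
share = sample["Ratings"].value_counts(normalize=True).sort_index()

# Most frequent rating value
most_common = sample["Ratings"].mode().iloc[0]

# Assumed threshold: ratings of 7 or more count as positive
positive_share = (sample["Ratings"] >= 7).mean()

print(share)
print(f"Most common rating: {most_common}")
print(f"Share of positive ratings: {positive_share:.0%}")
```

On the real data, `data["Ratings"]` would take the place of `sample["Ratings"]`.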

As 10 is the highest rating a viewer can give, let’s look at the 10 movies that received the most ratings of 10:

In [16]:
data2 = data.query("Ratings == 10")
print(data2["Title"].value_counts().head(10))
Joker (2019)                       1479
Interstellar (2014)                1386
1917 (2019)                         820
Avengers: Endgame (2019)            812
The Shawshank Redemption (1994)     707
Gravity (2013)                      653
The Wolf of Wall Street (2013)      581
Hacksaw Ridge (2016)                570
Avengers: Infinity War (2018)       535
La La Land (2016)                   510
Name: Title, dtype: int64

So, according to this dataset, Joker (2019) received the most ratings of 10 from viewers.
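A raw count of 10 ratings favors movies that were rated by many people. As a possible complement, the sketch below also computes the share of each movie's ratings that are 10 and applies a minimum number of ratings. The titles, the ratings, and the `min_ratings` threshold are all made up for illustration.

```python
import pandas as pd

# Made-up ratings for three hypothetical titles
sample = pd.DataFrame(
    {
        "Title": ["A", "A", "A", "A", "B", "B", "C"],
        "Ratings": [10, 10, 5, 6, 10, 10, 10],
    }
)

# Per title: total ratings, number of 10s, and share of 10s
summary = (
    sample.assign(is_ten=sample["Ratings"].eq(10))
    .groupby("Title")
    .agg(
        n_ratings=("Ratings", "size"),
        n_tens=("is_ten", "sum"),
        share_tens=("is_ten", "mean"),
    )
)

# Assumed cut-off so that titles with very few ratings do not dominate
min_ratings = 2
ranked = summary[summary["n_ratings"] >= min_ratings].sort_values(
    "share_tens", ascending=False
)
print(ranked)
```

Here title "C" is excluded because it has only one rating, and "B" ranks above "A" even though both received two ratings of 10.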
